In joepowers16/rethinking: Statistical Rethinking book package

Overview:

I think the single biggest insight of the past 2 days is the dnorm() does not output a probability value that x will take on a particular value. Rather dnorm() outputs the height of the density curve at a particular value of x. This height can be used later to calculate a probability for a range of x values (e.g., from -1 to 1), but in the short term dnorm() just outputs a y-value for each x within a normal distribution with a given mean and sd. Narrower sd leads to higher y values near the mean as a greater density of probability will be concentrated near the mean for distributions with little variance.

library(knitr)
library(tidyverse)

source(here::here("file_paths.R"))

file_normal_pdf <- paste(dir_images, "normal_pdf.png", sep = "/")

runif(n, min, max)

dunif(x, min, max) will output the y-axis height at x of the uniform distribution between min and max.

dunif(x = c(.25, .5), min = 0, max = 1)

But note that this y-axis output from the uniform density function is not the probability that the random variable will equal x, it is just the height at the y-axis when the random variable equals x. Further computation is necessary to determine probabilities of x taking on specified values, and those values must be specified as a range when x is continuous.

Note that the y-axis height associated with any x value is equal to 1 / (the range of the uniform distribution).

dunif(x = 10, min = 0, max = 120)
dunif(x = 10.5, min = 0, max = 120)
1/120

a <- 0
b <- 0.5

punif(b) - punif(a)

# Probability computed using the integrate() function of the pdf dunif()
integrate(
  f = dunif, 
  lower = 0.1, 
  upper = 0.5
)

First Google hit with code

U of Arizona site says that dunif() is really only useful for graphing, not for directly calculating probabilities.

In his example below I see that dunif() works even though x is undefined.

curve( 
  expr = dunif(x , min = 2 , max = 6), 
  from = 0, to = 8,
  ylim = c(0 , 0.5), 
  ylab = "f(x)", 
  main = "Uniform Density f(x)"
)

Normal Distributions

tibble(x = seq(from = 0, to = 100, by = .1)) %>% 
  ggplot(aes(x = x, y = dnorm(x, mean = 50, sd = 15))) +
  geom_line() +
  ylab("density")

Here dnorm() takes the 150^10 data points in x and distributes probability across all those points such that plotting y on x produces a normal distribution.

x <- seq(from = 0, to = 100, by = 1)
dnorm(x, 50, 15)
dnorm(x, 50, 15) %>% length()

# Notice how the sd of the distribution increases the max y value. 
dnorm(x, 50, 15) %>% max()
dnorm(x, 50, 1) %>% max()

The above code demonstrates how there are several moving parts of interest. x is just a vector of values. dnorm() will tell us the y-value associated with each of these x-values on a probability curve with a given mean and sd. Narrower sd leads to higher y values near the mean as a greater density of probability will be concentrated near the mean for distributions with little variance.

tibble(x = seq(from = 0, to = 100, by = 1)) %>% 
  ggplot(aes(x = x, y = dnorm(x, mean = 50, sd = 15))) +
  geom_line() +
  ylab("density")

knitr::include_graphics(file_normal_pdf)

The probability density function (PDF) outputs the height of the y-axis on the probability density curve at the provided x value. The PDF does NOT output the probability that a random variable will take on a particular value of x. To calculate the probability of x taking on a values can only be calculated as x falling within a range of values, because the probabillity of a continuous random variable taking on a specific value is 1/inf.

In contrast, the binomial probability distribution has a probability mass function (PMF) rather than a density function because it's outputs are discrete.

# outputs Pr(1 head in 1 flip)
success_n <- 1 
trials_n <- 1
prob_success <- .5
dbinom(success_n, trials_n, prob_success)

# outputs probability of 5 heads in 10 flips
success_n <- 5 
trials_n <- 10
prob_success <- .5
dbinom(success_n, trials_n, prob_success)